Psychological Review — Latest Matching Preprints

1

Nonlinear influence of reward volatility on arbitration between multiple learning strategies reflects cost-benefit optimization

Yamada, T.; Samejima, K.

2026-06-19 animal behavior and cognition 10.64898/2026.06.15.732293 medRxiv

Top 0.1%

7.8%

Action selection involves two systems: a model-free reinforcement learning strategy, which relies on experience with action-outcome pairs, and a model-based reinforcement learning strategy, which enables more flexible behavior via inference using a model of the invariant environmental structure. Although environmental change requires more flexible behavior, the ability of volatility, a higher-order statistic that captures how rapidly or frequently the environment changes, to systematically modulate these strategies remains unclear. We examined the effects of reward volatility on arbitration between model-free and model-based reinforcement learning strategies using two modified two-step decision tasks. In Experiment 1, participants performed tasks with different levels of reward volatility and time pressure. In Experiment 2, we systematically manipulated reward volatility across a broader range to assess the relationship between volatility and learning strategy. Behavioral data were analyzed using model-agnostic one-trial and multitrial back analyses, reinforcement learning simulations, and hierarchical Bayesian model fitting. Across experiments, reward volatility exerted an inverse U-shaped nonlinear effect on the arbitration between model-free and model-based reinforcement learning strategies, as the model-based learning strategy was strongly driven at intermediate levels of reward volatility. These modulation effects were observed only in individuals who had learned the transition structure in the task, whereas those who had not learned the transition structure relied on the model-free learning strategy regardless of reward volatility. Reinforcement learning simulations revealed that the relative advantage of the model-based learning strategy over the model-free learning strategy peaked at intermediate levels of reward volatility. Additionally, increased time pressure shifted behavior toward the model-free learning strategy. 
These results demonstrated that humans do not always use the model-based reinforcement learning strategy in uncertain and dynamic environments, even when they are aware of the task structure, supporting cost-benefit optimization.

Author Summary: The ability to flexibly guide behavior by carefully considering future consequences is a prominent property of human intelligence and rationality. However, what drives this deliberative system? In this study, we investigated the factors that promote deliberative versus habitual behavior using decision-making tasks with uncertain structures and changing rewards. We found that participants who spontaneously learned the hidden transition structure in the task used this knowledge to guide deliberative behavior. Conversely, participants who did not learn the structure relied primarily on habitual strategies, repeating actions that had previously been rewarded. Among participants who learned the structure, the degree of deliberative behavior changed nonlinearly with reward volatility, that is, the speed at which rewards changed over time. We also observed that limiting the decision time reduced deliberative behavior and promoted habitual responding. These findings suggest that under uncertain and dynamic environments, deliberative control is adaptively regulated according to cost-benefit optimization. Our results contribute to understanding how humans flexibly adjust their behavioral control systems in response to environmental conditions.
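One common way to formalize arbitration between the two strategies in a two-step task is a weighted mixture of model-based and model-free first-stage values. The sketch below is illustrative only and is not taken from the preprint: the function names, the mixing weight `w`, and every numeric value are assumptions.

```python
def model_based_values(transition_probs, second_stage_values):
    """First-stage values obtained by planning through a known transition model.

    transition_probs[a][s] is the probability that first-stage action a leads
    to second-stage state s; second_stage_values[s] lists the learned values
    of the options available in state s.
    """
    best_per_state = [max(values) for values in second_stage_values]
    return [
        sum(p * v for p, v in zip(probs, best_per_state))
        for probs in transition_probs
    ]


def hybrid_values(q_model_based, q_model_free, w):
    """Mix the two estimates; w = 1 is purely model-based, w = 0 purely model-free."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_model_based, q_model_free)]


# Hypothetical numbers chosen only for illustration.
transitions = [[0.7, 0.3], [0.3, 0.7]]  # common vs. rare transitions
stage_two = [[1.0, 0.2], [0.0, 0.1]]    # learned second-stage option values
q_mf = [0.2, 0.6]                       # cached (model-free) first-stage values

q_mb = model_based_values(transitions, stage_two)
q_mix = hybrid_values(q_mb, q_mf, w=0.5)
```

Under this framing, a shift toward model-free control, such as the one reported under time pressure, would correspond to a smaller `w`.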

2

A computational account of how positive performance bias supports cognitive effort

Mori, K.; Yamada, M.

2026-05-18 neuroscience 10.64898/2026.05.13.725021 medRxiv

Top 0.1%

6.4%

The willingness to exert cognitive effort is essential but is constrained by the subjective cost of effort. Although effortful tasks are often avoided, positive bias about one's own performance may help sustain engagement with cognitive demands. Here, participants completed an effort-based decision-making task and reported trial-by-trial predictions of their own performance, allowing us to quantify performance prediction error (PPE) as the discrepancy between subjective and objective accuracy. The results showed that PPE was predominantly positive and increased with effort level, indicating greater overestimation under higher cognitive demands. Using a computational model, we show that choices were best explained by a learning model in which rewarded trials accompanied by positive PPE decreased subsequent sensitivity to effort. A confidence-based control model did not provide a better account of choices, suggesting that this effect was better captured by positive performance bias than by confidence alone. Our findings provide a computational account of how biased self-evaluation may attenuate the subjective cost of cognitive effort and extend the positive bias literature to tasks that require cognitive effort.

3

Transitive reasoning as linear classification

Ferrera, V. P.; Lippl, S.; Kay, K.; Munoz, F.; Jin, Y.; Jensen, G.; Terrace, H.

2026-06-28 neuroscience 10.64898/2026.06.24.734346 medRxiv

Top 0.1%

6.2%

Transitive inference (TI) is the ability to reason about transitive relationships in an ordered set of items (e.g., if A>B and B>C, then A>C). TI is widely held to depend on a linear representation of the serial (rank) order of those items. By what computational mechanism is such an ordering constructed during learning, and how is it used to make choices that obey transitivity? Here we take a minimalist approach, applying least-squares estimation (LSE) to a serial learning task commonly used to test TI in humans and animals. In this formulation, LSE computes a linear classifier that maps task conditions onto behavioral outcomes. This algorithm makes no explicit assumptions about transitivity or serial order, yet it reproduces key empirical features of TI; namely, the ability to generalize beyond the training set, and a symbolic distance effect (SDE) in performance accuracy. Applying the classifier to individual items produces an internally ordered representation of rank from which both generalization and the SDE naturally emerge. The approach also yields a decision mechanism, in the form of a differencing operation, for selecting the correct item from any pair. These findings reframe TI as a linear classification problem, challenging conventional assumptions about the cognitive mechanisms required for transitive reasoning.
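The idea that a least-squares linear classifier trained only on adjacent pairs can generalize to untrained pairs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pair encoding (a difference of one-hot vectors), the ±1 targets, the five-item list, and the use of batch gradient descent to reach the least-squares solution are all choices made here.

```python
# Items are indexed by true rank: 0 = A (highest) ... 4 = E (lowest).
N_ITEMS = 5


def encode(first, second):
    """Encode a presented pair as a difference of one-hot vectors (assumed encoding)."""
    x = [0.0] * N_ITEMS
    x[first] = 1.0
    x[second] = -1.0
    return x


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_adjacent_pairs(lr=0.05, steps=2000):
    """Minimize squared error on adjacent pairs only, by batch gradient descent."""
    data = []
    for i in range(N_ITEMS - 1):
        data.append((encode(i, i + 1), 1.0))   # first item is the correct choice
        data.append((encode(i + 1, i), -1.0))  # second item is the correct choice
    w = [0.0] * N_ITEMS
    for _ in range(steps):
        grad = [0.0] * N_ITEMS
        for x, y in data:
            err = dot(w, x) - y
            for k in range(N_ITEMS):
                grad[k] += 2.0 * err * x[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def margin(w, first, second):
    """Signed evidence for choosing `first` over `second` (a differencing operation)."""
    return dot(w, encode(first, second))


weights = train_adjacent_pairs()
```

In this toy setting the learned weights are ordered by rank, so `margin(weights, 1, 3)` is positive for the untrained pair B versus D, and the margin grows with the rank distance between items, which is one way a symbolic distance effect could arise.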

4

Recent history attracts and repels perceptual decisions depending on surprise

Kaltenmaier, A.; Press, C.

2026-06-30 neuroscience 10.64898/2026.06.25.734467 medRxiv

Top 0.1%

5.5%

Past sensory experience shapes our perceptual decision-making in the now. Popular models frame perceptual decisions as either attracted towards or repelled away from recent sensory information, but it is unclear when and why these distinct effects emerge. We here ask whether effects turn from attractive to repulsive depending on the level of surprise elicited by the precision-weighted discrepancy between past and present sensory states. This model is based upon the idea that attraction is adaptive for optimizing efficiency and accuracy when discrepancies are small, because they likely reflect sensory noise rather than real change in the environment. In contrast, repulsion may reflect the upweighting of counterfactual evidence when discrepancies are large because they more likely signal the need for model updating. We test this model on a large amount of recently-collated trial-by-trial serial dependence data and consistently find support for it across the dataset, participant, and trial-by-trial level. Specifically, serial dependence effects are attractive at low discrepancies between past and current sensory states but turn repulsive when discrepancies are larger. Higher sensory precision is found to accelerate this flip by reducing the modal discrepancy threshold required to trigger repulsion effects. We discuss how these findings necessitate extending existing theories of serial dependence, and how they may resolve conflicts in the broader predictive processing, learning and perception literatures.

5

A neural network model of free recall learns multiple memory strategies

Li, M.; Jensen, K. T.; Zhang, Q.; Lu, Q.; Mattar, M. G.

2026-07-06 neuroscience 10.1101/2025.09.25.678592 medRxiv

Top 0.1%

5.4%

Humans exhibit structured patterns of memory recall, including a tendency to recall more recent information and to recall events in the same order they were experienced. Classic computational models explain these patterns by positing that memories incorporate the ongoing "temporal context", formed by smoothly integrating the stimulus history. However, it is unclear whether a single mechanism can account for the full repertoire of human memory strategies, as the optimal approach may be task-dependent. For example, human memory experts widely apply the "memory palace" strategy, which is empirically better but not captured by temporal context models. Here we show that neural networks optimized for free recall develop diverse retrieval strategies, with only some of them resembling temporal context models. The best-performing models discovered a stimulus-invariant index code that emphasizes the studied position of each list item, instead of its temporal context. This creates a stable scaffold for forward recall akin to the memory palace technique. This index code was more likely to emerge when networks were i) encouraged to recall all studied items rather than prioritizing a few items, and ii) prevented from relying on recency, resonating with human data. Our findings demonstrate that human-like recall patterns can arise from multiple distinct computational mechanisms, and that sequential retrieval using item index is an optimal strategy that explains expert-level recall performance.

6

What group averages conceal: functional heterogeneity in human eyeblink habituation

Perez, O. D.; Cancino, N.; Hermosilla, D.; Soto, F. A.; Vogel, E. H.

2026-06-25 animal behavior and cognition 10.64898/2026.06.21.733594 medRxiv

Top 0.1%

5.1%

In animal learning research, learning is often represented by plotting a behavioral measure as a function of training trials. A particularly clear case is habituation, a basic form of learning in which repeated presentation of a stimulus produces a decrement in responding. Although retention tests provide the strongest basis for evaluating durable habituation once short-lived performance effects have dissipated, the pattern of response change across stimulus repetitions, or habituation curve, remains theoretically and empirically relevant because it is used to characterize determinants of habituation, individual and clinical profiles, and functional forms, including linear, curvilinear, asymptotic, and mixed incremental-decremental patterns of responding. However, group averaged curves may conceal substantial individual heterogeneity. Here, we analyzed archived human eyeblink habituation data from 157 participants to ask whether the curve shape selected for the group average reflects the curve shapes observed at the individual level. Five candidate functions were fitted separately to each participant and to the corresponding group average. No single function characterized most individuals. More importantly, the model selected for the group average differed from the most frequent individual model in all four groups. When data were pooled across groups, the average favored a dual-process form, a shape that matched the individual plurality in none of them. Simulation analyses showed that averaging heterogeneous individual trajectories can itself produce a group curve that favors a more complex model. Our findings show that group averaged habituation curves should not be treated as direct descriptions of the typical individual trajectory.
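The point that an averaged curve can take a shape no individual shows can be illustrated with a deliberately extreme toy case. The sketch below is not the authors' simulation: the step-shaped individual curves, the drop trials, and the response levels are hypothetical values chosen only to make the effect visible.

```python
def step_curve(drop_trial, n_trials, high=1.0, low=0.2):
    """Response stays at `high` until `drop_trial`, then falls abruptly to `low`."""
    return [high if t < drop_trial else low for t in range(n_trials)]


def group_average(curves):
    """Trial-by-trial mean across individuals."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


n_trials = 10
# Hypothetical individuals who each habituate abruptly, but on different trials.
individuals = [step_curve(drop, n_trials) for drop in (2, 4, 6, 8)]
average = group_average(individuals)

# Each individual takes only two response levels; the average passes through
# several intermediate levels and therefore looks like a gradual decline.
levels_per_individual = [len(set(curve)) for curve in individuals]
levels_in_average = len(set(round(v, 6) for v in average))
```

Here every individual curve is a single abrupt step, yet the group average descends through five levels, so a smooth function fitted to the average would describe none of the individuals.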

7

Suboptimal human inference reflects an efficient and flexible information bottleneck

Parker, J. A.; Filipowicz, A. L. S.; Li, K.; Balasubramanian, V.; Kable, J. W.; Gold, J. I.

2026-06-11 neuroscience 10.64898/2026.06.10.731461 medRxiv

Top 0.1%

3.9%

Human decision-making behavior varies widely across individuals and task conditions. This variability is often interpreted in terms of different suboptimal decision strategies, but the principles that govern these suboptimalities remain poorly understood. We propose that some of these suboptimalities can be understood in terms of limited-capacity, but information-efficient, inference processes that inform decision-making. We developed and used new theoretical and empirical approaches to compare the amount of information used (capacity) to the effectiveness with which it was used (accuracy) by individual participants performing simple inference tasks. Variable, suboptimal performance was explained largely by inference that had variable, limited information capacity. Across these capacity limits, and regardless of whether the inference strategy was based on optimal or heuristic principles, the information was used effectively to maximize accuracy for a given capacity. This form of flexible and efficient information bottleneck reflects fundamental capacity-accuracy tradeoffs that structure individual variability.

8

Confirmation Bias Exists in the Face of False Information

Razi, H.; Sambrook, T.; Garrett, N.

2026-05-11 neuroscience 10.64898/2026.05.07.723487 medRxiv

Top 0.1%

3.5%

Confirmation bias impacts judgments and decisions across a range of domains including finance, policy and science. Here we examine whether explicitly labelling information as true or false disrupts a core underlying computational mechanism that can generate this pervasive bias - asymmetric learning. Human participants (Study 1: N=47; Study 2: N=57) completed a two-alternative forced-choice (2AFC) task previously used to test for the presence of confirmation bias. Participants made choices between pairs of options that could win or lose money and received either factual or counterfactual feedback after each choice. We introduced a key novel feature into the task - providing explicit cues that signalled to participants whether feedback they had seen was true (verified) or false (debunked). Learning in response to feedback was attenuated under false compared to true labels but was present under both. Fitting computational models to participants' choices enabled us to examine how sensitivity to the feedback varied as a function of both the label (true/false) and confirmation (confirmatory/disconfirmatory). This revealed a distinct pattern of learning rates typical of confirmation bias (enhanced learning from positive prediction errors for chosen options and from negative prediction errors for unchosen options) in response to both true and false labels. The findings highlight how confirmation bias plays an important role in the effectiveness of interventions designed to verify true and/or debunk false claims. Verification is less likely to succeed when information disconfirms prior beliefs. Conversely, debunking false claims is unlikely to succeed when the information confirms one's prior beliefs.
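The asymmetric-learning pattern described above (larger updates from positive prediction errors for chosen options and from negative prediction errors for unchosen options) can be written as a small value-update rule. This is a minimal sketch, not the fitted model from the preprint: the function name, the two learning-rate parameters, and the example values are assumptions.

```python
def update_value(q, outcome, was_chosen, alpha_confirm, alpha_disconfirm):
    """Update one option's value with a learning rate that depends on confirmation.

    For a chosen option, a positive prediction error confirms the choice.
    For an unchosen option, a negative prediction error confirms the choice.
    """
    prediction_error = outcome - q
    if was_chosen:
        confirmatory = prediction_error > 0
    else:
        confirmatory = prediction_error < 0
    rate = alpha_confirm if confirmatory else alpha_disconfirm
    return q + rate * prediction_error


# Hypothetical learning rates: confirmatory feedback is weighted more heavily.
ALPHA_CONFIRM, ALPHA_DISCONFIRM = 0.4, 0.1

chosen_win = update_value(0.0, 1.0, True, ALPHA_CONFIRM, ALPHA_DISCONFIRM)
chosen_loss = update_value(0.0, -1.0, True, ALPHA_CONFIRM, ALPHA_DISCONFIRM)
unchosen_win = update_value(0.0, 1.0, False, ALPHA_CONFIRM, ALPHA_DISCONFIRM)
unchosen_loss = update_value(0.0, -1.0, False, ALPHA_CONFIRM, ALPHA_DISCONFIRM)
```

To capture the label manipulation, one could fit a separate pair of learning rates for feedback labelled true and feedback labelled false. The abstract's finding would then correspond to smaller rates under false labels with the confirmatory rate still exceeding the disconfirmatory rate in both.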

9

The Metacognitive Sensitivity of Verbal and Numerical Confidence Reports

Zylberberg, A.; Alvarez Heduan, F.

2026-05-18 animal behavior and cognition 10.64898/2026.05.13.724887 medRxiv

Top 0.1%

3.1%

We study how confidence in perceptual decisions depends on whether it is communicated verbally (e.g., "very likely") or numerically (e.g., "80% certainty"). We find that verbal expressions more reliably distinguish correct from incorrect choices than numerical reports, challenging the common assumption that numerical probabilities provide more precise representations of uncertainty. Additionally, in a dyadic decision-making task in which participants can revise their initial reports based on a partner's choice and expressed confidence, verbal and numerical reports are equally effective in supporting accurate revisions of initial judgments. Together, these results underscore the effectiveness of verbal expressions as a means of conveying decision confidence.

10

Dynamic Modulation of Distractor Suppression by Tonic and Trial-Level Alertness Fluctuations: A Pupillometric Study

Chen, S.; Mueller, H. J.; Shi, Z.

2026-06-29 neuroscience 10.64898/2026.06.24.733323 medRxiv

Top 0.1%

2.7%

Attentional control balances proactive suppression of predictable distractors with reactive suppression of unexpected ones. Yet, how internal states such as alertness shape this balance is unclear. Using pupillometry and eye tracking across two probability-cueing experiments (conducted in 2024) with varying distractor prevalence, we distinguished tonic (baseline pupil size across blocks) from trial-level pupil size fluctuations (trial-by-trial residual variability in pre-stimulus pupil size). With moderate prevalence, suppression of frequent-region distractors developed gradually, whereas high prevalence induced near-immediate suppression. Behavioral measures (e.g., reaction times) were closely linked to tonic and trial-level pupil size fluctuations. Critically, both alertness components jointly influenced control: during early learning, heightened trial-level pupil size increased distractor capture and reduced target fixations, whereas later on, suppression shifted to a proactive mode resilient to trial-level fluctuations. Under high prevalence, this shift occurred faster. Notably, higher trial-level pupil size generally accelerated first target selection. These findings show that tonic alertness and trial-level alertness fluctuations dynamically regulate reactive and proactive control during statistical learning.

Impact Statement: This study shows that people become better at ignoring predictable distractions over time, but that this improvement depends not only on what they have learned about the task environment, but also on their current level of alertness. By combining eye tracking and pupil measures, we found that temporary increases in alertness can sometimes help people orient more quickly to relevant information, yet during earlier stages of learning they can also make attention more vulnerable to distracting events.
These findings suggest that successful focus in complex environments depends on a dynamic interplay between learned expectations and moment-to-moment fluctuations in mental state, with implications for understanding sustained attention in settings such as monitoring, driving, and other tasks that require people to stay engaged while resisting distraction.

11

Precision-Weighted Updating Explains Serial Dependence Across Sensory and Contextual Transitions

Qu, C.; Shi, Z.

2026-06-10 neuroscience 10.64898/2026.06.06.730048 medRxiv

Top 0.1%

1.7%

Serial dependence is influenced by sensory uncertainty and contextual continuity, but it remains controversial whether these influences reflect separate mechanisms or different expressions of a shared updating process. Across two time reproduction experiments (N = 44), we examined how motion coherence and coherence transitions modulated the attraction of recent temporal history while controlling for central tendency effects from the current stimulus. In Experiment 1, low coherence led to stronger serial dependence than high coherence. In Experiment 2, enhanced coherence categories introduced salient contextual boundaries; serial dependence was markedly stronger on same-category transitions than on switch transitions. A three-state Kalman filter model, comprising fast (serial dependence), slow (central tendency), and bias (decision carryover) states, captured these patterns through coherence-dependent modulation of fast-state process noise and Kalman gain. Within the tested model space, this precision-weighting account was selected in both experiments, with little evidence that an explicit state reset was needed. These findings support the precision-weighted updating account in which recent history is weighted according to the reliability and stability of the current perceptual environment.
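The precision-weighting logic can be seen in a single scalar Kalman update, where the gain sets how far the estimate moves from recent history toward the current observation. The sketch below is a one-state simplification for illustration, not the three-state model from the preprint; the function name and all variances and durations are assumed values.

```python
def kalman_step(prior_mean, prior_var, observation, obs_var, process_var):
    """One predict-and-update step of a scalar Kalman filter."""
    predicted_var = prior_var + process_var           # uncertainty grows between trials
    gain = predicted_var / (predicted_var + obs_var)  # weight on the new observation
    posterior_mean = prior_mean + gain * (observation - prior_mean)
    posterior_var = (1.0 - gain) * predicted_var
    return posterior_mean, posterior_var, gain


# Hypothetical values: history suggests 0.8 s, the current stimulus lasts 1.2 s.
history_mean, history_var, process_var = 0.8, 0.02, 0.01
current_duration = 1.2

# A reliable observation (e.g., high coherence) vs. a noisy one (e.g., low coherence).
est_reliable, _, gain_reliable = kalman_step(
    history_mean, history_var, current_duration, obs_var=0.01, process_var=process_var)
est_noisy, _, gain_noisy = kalman_step(
    history_mean, history_var, current_duration, obs_var=0.09, process_var=process_var)
```

With the noisier observation the gain is lower, so the estimate stays closer to the value carried over from recent history, which is the direction of the stronger serial dependence reported for low coherence.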

12

Within-Trial Noise Accounts for Inhibition of Return

Seidel Malkinson, T.; Bourgeois, A.; Wattiez, N.; Chica, A. B.; Pouget, P.; Bartolomeo, P.

2026-05-08 neuroscience 10.64898/2026.05.05.722974 medRxiv

Top 0.1%

1.6%

Inhibition of return (IOR) refers to the slowing of response times (RTs) for stimuli presented at previously inspected locations relative to novel locations. However, the exact processing stage(s) at which IOR occurs, and its nature across different response modalities, remain debated. By reanalyzing RT data from a target-target IOR paradigm with a single noisy accumulator model, we tested whether IOR could occur at sensory or attentional stages of processing, or at later stages of decision and action selection. We considered IOR under two conditions: manual and saccadic responses. The within-trial Gaussian noise parameter best explained both manual and saccadic IOR, suggesting that in both modalities, IOR may result from a more fluctuating accumulation of evidence for repeated locations. These results support the hypothesis that target-target IOR may primarily involve attentional-level mechanisms.

Significance statement: We respond more slowly to a stimulus that is presented within a short interval in the same location ("inhibition of return"), a bias thought to promote efficient visual exploration. Using evidence-accumulation modeling of manual and eye-movement reaction times from two previous studies, we found that the key change linked to inhibition of return is greater within-trial variability (noise) in evidence accumulation, not a higher decision threshold. Understanding which processing stage is affected can help connect behavioral effects to the brain networks that support attention and orienting.

13

Task-space dimensions guide human exploration in complex environments

An, J.; Hu, J.; Wu, Y. E.; Ning, S.; Liu, C.; Pan, Y.; Zhu, F.; Wang, R.; Ji, N.

2026-05-04 animal behavior and cognition 10.64898/2026.04.29.720265 medRxiv

Top 0.1%

1.4%

Humans frequently make decisions in complex, high-dimensional environments, where identifying task-relevant information is critical for rapid behavior optimization. Humans outperform standard reinforcement learning agents in navigating such complexity, yet the cognitive strategies of humans remain unclear. To address this, we developed a novel multi-dimensional learning task in which only a subset of dimensions is reward-related. Crucially, unlike prior studies, subjects are uninformed of the true task dimensionality and have to identify the reward-related dimensions through exploration. This design closely mimics the ambiguity in real-world tasks. Our results identified two stereotyped choice patterns that reveal "dimension-guided" strategies in exploration and exploitation. Cross-subject analyses suggest that dimension-guided exploration may promote the efficiency of reward-based learning. These findings indicate that humans leverage task dimensionality to guide exploration, and provide inspiration for improving exploration efficiency in AI agents.

14

Serial dependence in duration perception reveals reliability-weighted updating of the prior

Otsuka, T.; van Rijn, H.; Kruijne, W.; de Jong, J.

2026-06-04 neuroscience 10.64898/2026.06.01.729229 medRxiv

Top 0.1%

1.4%

Bayesian theories of perception propose that perceptual estimates result from the integration of prior beliefs ("priors") with sensory input ("likelihood"), weighted by their reliability. While Bayesian theories assume that priors are continuously updated over time, empirical evidence for such reliability-weighted updating modulating sequential percepts remains lacking. Here, we leverage a behavioral phenomenon called serial dependence--in which perceptual judgments are attracted toward previous stimuli--to test a central prediction of Bayesian theories: that perceptual estimates are sequentially updated according to the reliability of successive stimuli. We used a duration reproduction task in which the reliability of perceived duration was manipulated via signal-to-noise ratio by embedding stimuli in dynamic white noise. Consistent with the prediction of Bayesian theories, serial dependence in perceived duration was enhanced by increased reliability of previous stimuli and attenuated by increased reliability of current stimuli. Computational modeling revealed that changes in sensory noise (i.e., the width of the likelihood) can account for the reliability-dependent modulation of serial dependence. These findings provide empirical evidence for reliability-weighted updating, supporting a central prediction of Bayesian theories that prior information and sensory input are iteratively integrated to calibrate perceptual estimates.

15

Diurnal rhythms of choice: a novel state-dependent drift diffusion model uncovers time-dependent changes in rat decision making

Senne, R. A.; Xia, H.; Duebel, H. F.; Do, Q.; Kane, G.; Fourie, J.; Ramirez, S.; Scott, B.; DePasquale, B.

2026-05-28 animal behavior and cognition 10.64898/2026.05.25.727672 medRxiv

Top 0.2%

1.1%

Time-of-day severely impacts human decision-making, with real-world consequences. Studying shifts in decision-making strategy requires controlled, long timescale behavioral measurement and analyses that can extract insight from time-varying behavior. We introduce two complementary advances to address this gap: an autonomous 24-hour training facility for continuous behavioral measurement during decision-making and an interpretable modeling framework that captures non-stationary decision dynamics from reaction times and choices. Rats were trained on a visual evidence accumulation task across months, generating over a half million trials spanning the circadian period. Our model revealed latent behavioral states characterized by distinct evidence accumulation parameters, including differences in drift rate, bias, and decision-commitment time. These states recur across days and align with feeding schedules and the light-dark cycle, producing periodic fluctuations in performance over 24 hours. Together, these results demonstrate how continuous behavioral sampling combined with generative modeling uncovers long-timescale structure in decision-making obscured by stationary analyses.

Highlights: 24-hour live-in operant system allows autonomous training in cognitive tasks across months; 24-hour measurements reveal that rat performance fluctuates with time of day; novel DDM-HMM framework identifies reaction time and accuracy shifts across multiple timescales; DDM-HMM captures serial dependence in decisions that classic models ignore.

16

Optimal Practice Schedules in a Dual-Rate Model of Motor Adaptation, and Their Recovery by Reinforcement Learning

Jeter, R.; Todorov, D.; Molkov, Y.

2026-06-22 neuroscience 10.64898/2026.06.17.732970 medRxiv

Top 0.2%

1.0%

A clinician guiding a stroke patient through a 45-minute rehabilitation session, a coach planning a training day, and a teacher choosing the order of practice problems all face the same question: "given everything practiced so far, what should the next trial be?" The motor-learning literature offers two coarse answers, blocked and interleaved ("random") practice, with a well-known dissociation: blocked practice gives faster acquisition but worse retention, while interleaved practice gives the opposite. We argue that this dissociation is not a fixed property of practice schedules but a shadow of a richer structure. In particular, for a learner whose memory has a fast shared component and slower context-specific components, the best schedule should be a function of the learner's current internal state and the time remaining before the retention probe. We make this precise in a minimal two-context fast-slow learner model whose optimal schedules can be computed exactly for short sessions and approximated by a structured beam-search upper bound for longer ones. The optimal schedule is not blocked, not interleaved, and not a single rule; it is a family of schedules determined by how much retention is weighted relative to acquisition. The family has three regimes (alternating, mixed, blocked-with-late-correction), and for long sessions the optimal schedule has an interpretable structure: exploit one context, repair the neglected one, then interleave to lock in retention. We then investigate whether a reinforcement-learning teacher, observing only the learner's actions and errors without access to their internal memory states, can learn these optimal policies from interaction alone.
Comparing these learned policies against the exact optima, we show that a model-free agent (PPO) recovers the short-horizon schedules and the long-horizon block-repair-interleave motif in the intermediate regime, but the benchmark also exposes a sharp failure in the acquisition-dominated regime, where PPO collapses to pure blocking and misses a sparse terminal correction. A warm-start diagnostic shows this failure is a genuine metastability of policy gradients rather than a tuning artifact, with blocked-plus-switch and pure-blocked acting as competing attractors that PPO cannot stabilize between. A hyperparameter sweep over observation history reveals that the agent requires very little behavioral context to plan optimally, demonstrating that partial observability is not a major barrier to finding optimal practice schedules. Finally, we discuss the implications of our framework for motor adaptation and contextual interference, offering practical insights on how instructors can design finite practice sessions to favor long-term retention.
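To make the learner model concrete, the sketch below simulates a memory with one fast state shared across two contexts and one slow state per context, then runs a blocked and an alternating schedule through it. It is a hedged illustration rather than the authors' model: the update equations follow the generic two-rate form, and the retention and learning-rate values, the rule that all slow states decay on every trial, and the function name are assumptions.

```python
def simulate_practice(schedule, a_fast=0.6, b_fast=0.3, a_slow=0.99, b_slow=0.05,
                      target=1.0):
    """Run a practice schedule (a list of context indices, 0 or 1) through the learner.

    Returns the final fast state, the two slow states, and the per-trial errors.
    """
    fast = 0.0
    slow = [0.0, 0.0]
    errors = []
    for context in schedule:
        # Motor output combines the shared fast state and this context's slow state.
        error = target - (fast + slow[context])
        errors.append(error)
        # The fast state forgets quickly but learns quickly from every trial.
        fast = a_fast * fast + b_fast * error
        # Slow states decay a little each trial (assumed); only the practiced one learns.
        slow = [a_slow * s for s in slow]
        slow[context] += b_slow * error
    return fast, slow, errors


blocked = [0] * 20          # practice only context 0
alternating = [0, 1] * 10   # strictly interleave the two contexts

_, slow_blocked, errors_blocked = simulate_practice(blocked)
_, slow_alternating, _ = simulate_practice(alternating)
```

In this toy version, blocked practice reduces error on the practiced context but leaves the slow state of the other context untouched, whereas alternating practice builds some slow memory in both. Which schedule is preferable then depends on how acquisition and retention are weighted, which is the trade-off the abstract describes.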

17

Determinants of persistence in sequential effort-based decision-making

Chaigneau, A.; Moretti, R.; Iodice, P.; Pessiglione, M.; Pezzulo, G.

2026-05-14 neuroscience 10.64898/2026.05.11.723817 medRxiv

Top 0.2%

1.0%

Goal-directed behavior often requires sustained effort across a sequence of interdependent decisions, yet the determinants of persistence in such contexts remain poorly understood. Here, we investigated how individuals regulate persistence in a novel sequential effort-based task in which they controlled an avatar through successive checkpoints to reach a final goal and could make repeated attempts following failure. At each attempt, participants could choose either to persist in the same task or to disengage toward an easier but less rewarding alternative. We found that decisions to persist or disengage were jointly shaped by multiple interacting factors. Disengagement increased with task difficulty and lower skill level. It also increased with repeated attempts and time-on-task, indexing fatigue, and with accumulated errors, indexing lack of progress. Conversely, proximity to the goal promoted persistence and shaped decision dynamics by reducing choice conflict during persistence decisions and increasing hesitation during disengagement near the goal. Notably, clearing the first checkpoint produced a sharp increase in persistence, suggesting that early success plays a pivotal role. Furthermore, persistence reflected both retrospective and prospective evaluations of effort, with prior investment promoting commitment and anticipated effort reducing it. Finally, disengagement was preceded by short-term performance decline but not by gradual increases in decision conflict, suggesting relatively abrupt strategy shifts following repeated failures. Together, these findings provide a comprehensive account of persistence in sequential effortful tasks, showing that decisions to persist or disengage are jointly shaped by multiple factors related to fatigue, (lack of) progress, goal proximity, and early success.

18

Preserved geometry during representational drift enables stable perception and memory

Zaid, H.; Schaffer, E. S.

2026-06-28 neuroscience 10.64898/2026.06.25.734656 medRxiv

Top 0.2%

1.0%

In many brain regions, the stimulus tuning of neurons is stable on a timescale of hours but not on a timescale of weeks, a phenomenon often called representational drift. This would seem to imply that these brain regions cannot be used for stable recognition of sensory stimuli or the retrieval of associative memories learned several weeks prior. However, decoding approaches have demonstrated that in some cases, stable decoding of drifting representations is possible. In principle, adaptive decoding provides a plausible resolution to the paradox of how the brain operates with drifting representations, but we lack a deep understanding of what the requirements are for stable decoding to be possible. Here, we offer a general mathematical framework that explains when and why stable decoding from a drifting representation can be achieved. First, we demonstrate that both feedforward and recurrent networks preserve the geometry of their inputs when the network is sufficiently large, meaning that representational drift must also preserve geometry in these networks. Second, we demonstrate that drifting representations that have stable geometry are decodable with adaptive decoders. Therefore, not only the existence of preserved geometry in the presence of representational drift but also the ability to decode from drifting representations simply requires the population of neurons exhibiting representational drift to be large. This theoretical framework not only suggests that preserved geometry should be a general feature of drifting representations, it also explains the conditions under which empirical efforts to measure stable geometry will be successful.

19

Metacognitive Efficiency Reduces Confirmation Bias in Perceptual Decision Making

Perez-Bellido, A.; Moreno-Bote, R.; Fuentemilla, L.

2026-06-23 neuroscience 10.64898/2026.06.18.733181 medRxiv

Top 0.2%

0.9%

Humans exhibit a pervasive drive toward self-consistency, often failing to revise previous decisions even when confronted with contradictory evidence. Here, we investigate the computational mechanisms underlying decision revision in perceptual tasks, examining the regulatory role of metacognition. To do so, we capitalize on a novel paradigm in which participants are repeatedly presented with identical sensory information and allowed to revise their choices after each exposure. Our results reveal that repeated exposure to the same stimulus systematically biases subsequent judgments toward prior responses. Using drift-diffusion modeling, we tested competing explanations incorporating different assumptions about how prior choices affect evidence accumulation. Our findings indicate that consistency biases emerge from asymmetric sensory weighting, selectively amplifying information consistent with previous choices, a phenomenon akin to confirmation bias. Crucially, individuals with higher metacognitive skills exhibited weaker confirmatory biases and more flexible integration of repeated sensory information, enabling greater adaptability in decision-making. These findings highlight the continuous nature of perceptual inference and underscore metacognition's pivotal role in mitigating bias and optimizing decision flexibility.

20

Flexible belief updating drives the childhood advantage in statistical learning

Pesthy, O.; Toth-Faber, E.; Nagy, C.; Nemeth, M.; Janacsek, K.; Nemeth, D.

2026-06-30 neuroscience 10.64898/2026.06.30.735487 medRxiv

Top 0.2%

0.9%

Children often outperform adults in probabilistic statistical learning tasks, yet the mechanisms underlying this developmental advantage remain poorly understood. Here, we used eye-tracking measures of belief updating to examine how children and adults acquire and update predictions in a probabilistic sequence-learning task. Using the standard (oculomotor) reaction time measure, children showed stronger statistical learning than adults, replicating previous behavioral findings while revealing a more detailed profile of developmental differences in statistical learning. Critically, children updated their predictions more frequently: they were less likely to repeat previous predictions and more likely to shift their expectations in response to new input. Adults, in contrast, showed greater persistence, tending to maintain prior predictions even when those predictions were inconsistent with the underlying statistical structure. Despite these pronounced differences in updating behavior, the processing and use of prediction errors were remarkably similar across age groups. These findings indicate that developmental differences in statistical learning do not primarily arise from how prediction errors are computed, but rather from how prior beliefs and incoming information are weighted during belief updating. Children's enhanced learning may therefore reflect reduced reliance on stable priors and greater sensitivity to current sensory evidence, supporting a more exploratory learning strategy. Adults, by contrast, appear to favor an exploitative strategy that stabilizes existing predictions but reduces flexibility in probabilistic environments. More broadly, the results suggest that developmental changes in statistical learning may reflect age-related differences in how readily learners revise their predictions in response to incoming evidence. 
By integrating sensitive oculomotor measures with analyses that probe the mechanisms underlying belief updating, the present study provides a more fine-grained account of how predictive learning changes across development and offers a framework for reconciling previously inconsistent developmental findings in statistical learning.